Parallelized Boosting with Map-Reduce
Abstract
The rapid growth of large-scale data has created a pressing need for faster processing algorithms that still deliver strong performance. In this paper, we propose two novel algorithms, ADABOOST.PL (Parallel ADABOOST) and LOGITBOOST.PL (Parallel LOGITBOOST), that allow multiple computing nodes to participate simultaneously in constructing a boosted classifier. Our algorithms induce boosted models whose generalization performance is close to that of the respective baseline classifier. By exploiting their parallel architecture, both algorithms gain significant speedup. Moreover, the algorithms do not require individual computing nodes to communicate with each other, to share their data, or to share the knowledge derived from their data; hence, they are also robust in preserving the privacy of computation. We implemented our algorithms in the Map-Reduce framework and experimented on a variety of synthetic and real-world data sets to demonstrate their performance in terms of classification accuracy, speedup, and scaleup.
Keywords: boosting; parallel algorithms; classification; distributed computing.
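The idea of nodes that each build a boosted model on their own data, with no exchange between them, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`adaboost`, `map_phase`, `reduce_phase`), the use of decision stumps as weak learners, and above all the merge step, which simply concatenates the workers' weighted stumps and rescales their weights. The abstract does not spell out the merge rule of ADABOOST.PL, so this is not the paper's algorithm, and plain Python lists stand in for a real Map-Reduce framework.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]   # (features, label in {-1, +1})
Stump = Tuple[int, float, int]     # (feature index, threshold, polarity)
Model = List[Tuple[float, Stump]]  # weighted weak classifiers


def stump_predict(stump: Stump, x: List[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] > thr else -pol


def best_stump(data: List[Sample], w: List[float]) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = (0, 0.0, 1), float("inf")
    for j in range(len(data[0][0])):
        for thr in sorted({x[j] for x, _ in data}):
            for pol in (1, -1):
                cand = (j, thr, pol)
                err = sum(wi for (x, y), wi in zip(data, w)
                          if stump_predict(cand, x) != y)
                if err < best_err:
                    best, best_err = cand, err
    return best, best_err


def adaboost(data: List[Sample], rounds: int) -> Model:
    """Standard discrete AdaBoost on one worker's local partition."""
    n = len(data)
    w = [1.0 / n] * n
    model: Model = []
    for _ in range(rounds):
        stump, err = best_stump(data, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * y * stump_predict(stump, x))
             for (x, y), wi in zip(data, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def map_phase(partitions: List[List[Sample]], rounds: int) -> List[Model]:
    """Each worker trains only on its own partition (sequential stand-in)."""
    return [adaboost(part, rounds) for part in partitions]


def reduce_phase(models: List[Model]) -> Model:
    """Assumed merge rule: pool all stumps, scaling weights by 1/#workers."""
    m = len(models)
    return [(alpha / m, stump) for model in models for alpha, stump in model]


def predict(model: Model, x: List[float]) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1


# Two workers, each holding a disjoint slice of a 1-D toy data set.
partitions = [
    [([0.1], -1), ([0.2], -1), ([0.8], 1), ([0.9], 1)],
    [([0.3], -1), ([0.4], -1), ([0.6], 1), ([0.7], 1)],
]
merged = reduce_phase(map_phase(partitions, rounds=2))
```

Only the trained stumps and their weights leave each worker in this sketch; the raw samples stay local, which is the property the abstract emphasizes.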
Similar resources
Parallelizing Boosting and Bagging
Bagging and boosting are two general techniques for building predictors based on small samples from a dataset. We show that boosting can be parallelized, and then present performance results for parallelized bagging and boosting using OC1 decision trees and two standard datasets. The main results are that sample sizes limit achievable accuracy, regardless of computational time spent; that paral...
Map-Reduce Parallelization of Motif Discovery
Motif discovery is one of the most challenging problems in bioinformatics today. DNA sequence motifs are becoming increasingly important in analysis of gene regulation. Motifs are short, recurring patterns in DNA that have a biological function. For example, they indicate binding sites for Transcription Factors (TFs) and nucleases. There are a number of Motif Discovery algorithms that run seque...
Algorithms and hardness results for parallel large margin learning
We consider the problem of learning an unknown large-margin halfspace in the context of parallel computation, giving both positive and negative results. As our main positive result, we give a parallel algorithm for learning a large-margin halfspace, based on an algorithm of Nesterov’s that performs gradient descent with a momentum term. We show that this algorithm can learn an unknown γ-margin ...
Parallelizing Wavelet Transformation with a Reduction Style Framework
There is an increasing trend towards parallel processing and data processing using cloud environments. Simple APIs for expressing parallelism, such as map-reduce, are popular in cloud environments, but their expressibility is generally considered limited. This paper focuses on how a signal processing algorithm, wavelet transform, can be parallelized in a cloud environment comprising a cluster o...